Stream VByte: Faster Byte-Oriented Integer Compression
Authors
Abstract
Arrays of integers are often compressed in search engines. Though there are many ways to compress integers, we are interested in the popular byte-oriented integer compression techniques (e.g., VByte or Google’s VARINT-GB). Although not known for their speed, they are appealing due to their simplicity and engineering convenience. Amazon’s VARINT-G8IU is one of the fastest byte-oriented compression techniques published so far. It makes judicious use of the powerful single-instruction-multiple-data (SIMD) instructions available in commodity processors. To surpass VARINT-G8IU, we present STREAM VBYTE, a novel byte-oriented compression technique that separates the control stream from the encoded data. Like VARINT-G8IU, STREAM VBYTE is well suited for SIMD instructions. We show that STREAM VBYTE decoding can be up to twice as fast as VARINT-G8IU decoding over real data sets. In this sense, STREAM VBYTE establishes new speed records for byte-oriented integer compression, at times exceeding the speed of the memcpy function. On a 3.4 GHz Haswell processor, it decodes more than 4 billion differentially-coded integers per second from RAM to L1 cache.
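The abstract does not spell out the byte layout, so the following is only a scalar sketch of the idea of keeping the control stream apart from the data stream. It assumes 32-bit unsigned integers, a 2-bit length code per integer (byte count minus one), four codes packed per control byte, and little-endian data bytes. The function names `svb_encode` and `svb_decode` are hypothetical, and the SIMD decoding that gives the technique its speed is not shown.

```python
def svb_encode(values):
    """Encode 32-bit unsigned ints into (control, data) byte strings.

    Assumed layout: each integer gets a 2-bit code (byte length - 1);
    four codes share one control byte, lowest bits first.
    """
    control = bytearray((len(values) + 3) // 4)
    data = bytearray()
    for i, v in enumerate(values):
        if not 0 <= v < 2**32:
            raise ValueError("value out of 32-bit unsigned range")
        # Use 1 to 4 bytes; zero still occupies one byte.
        nbytes = max(1, (v.bit_length() + 7) // 8)
        control[i // 4] |= (nbytes - 1) << (2 * (i % 4))
        data += v.to_bytes(nbytes, "little")
    return bytes(control), bytes(data)


def svb_decode(control, data, count):
    """Decode `count` integers from separate control and data streams."""
    values = []
    pos = 0
    for i in range(count):
        # Read the 2-bit length code for integer i from the control stream.
        nbytes = ((control[i // 4] >> (2 * (i % 4))) & 3) + 1
        values.append(int.from_bytes(data[pos:pos + nbytes], "little"))
        pos += nbytes
    return values


control, data = svb_encode([1, 300, 70000, 2**32 - 1, 0])
decoded = svb_decode(control, data, 5)
```

Because all length codes sit together in the control stream, a decoder can read the lengths of several integers at once without scanning the data bytes, which is the property a SIMD implementation would exploit.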
Similar resources
Vectorized VByte Decoding
We consider the ubiquitous technique of VByte compression, which represents each integer as a variable length sequence of bytes. The low 7 bits of each byte encode a portion of the integer, and the high bit of each byte is reserved as a continuation flag. This flag is set to 1 for all bytes except the last, and the decoding of each integer is complete when a byte with a high bit of 0 is encount...
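The VByte scheme described in this abstract can be sketched in a few lines of scalar code. The sketch follows the stated convention (low 7 bits carry payload, high bit is 1 on every byte except the last) and additionally assumes that the least significant 7-bit group is written first, which the excerpt does not specify. The names `vbyte_encode` and `vbyte_decode` are hypothetical, and this is not the vectorized decoder the paper is about.

```python
def vbyte_encode(values):
    """Encode non-negative integers as VByte, low 7-bit group first."""
    out = bytearray()
    for v in values:
        if v < 0:
            raise ValueError("VByte encodes non-negative integers only")
        while v >= 0x80:
            # More groups follow: set the continuation flag (high bit).
            out.append((v & 0x7F) | 0x80)
            v >>= 7
        # Last byte of this integer: high bit stays 0.
        out.append(v)
    return bytes(out)


def vbyte_decode(data):
    """Decode a VByte byte string back into a list of integers."""
    values = []
    current = 0
    shift = 0
    for b in data:
        current |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7  # continuation flag set, keep accumulating
        else:
            values.append(current)  # high bit 0 ends the integer
            current = 0
            shift = 0
    if shift:
        raise ValueError("input ends in the middle of an integer")
    return values


encoded = vbyte_encode([5, 300])
decoded = vbyte_decode(encoded)
```

The decoder has to inspect the high bit of every byte before it knows where an integer ends, which is the data dependency that makes plain VByte hard to speed up.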
New adaptive compressors for natural language text
Semistatic byte-oriented word-based compression codes have been shown to be an attractive alternative to compress natural language text databases, because of the combination of speed, effectiveness, and direct searchability they offer. In particular, our recently proposed family of dense compression codes has been shown to be superior to the more traditional byte-oriented word-based Huffman cod...
Upscaledb: Efficient Integer-Key Compression in a Key-Value Store using SIMD Instructions
Compression can sometimes improve performance by making more of the data available to the processors faster. We consider the compression of integer keys in a B+-tree index. For this purpose, systems such as IBM DB2 use variable-byte compression over differentially coded keys. We revisit this problem with various compression alternatives such as Google’s VarIntGB, Binary Packing and Frame-of-Ref...
Boosting Text Compression with Word-Based Statistical Encoding
Semistatic word-based byte-oriented compressors are known to be attractive alternatives to compress natural language texts. With compression ratios around 30-35%, they allow fast direct searching of compressed text. In this article we reveal that these compressors have even more benefits. We show that most of the state-of-the-art compressors benefit from compressing not the original text, but t...
Impact-Based Document Retrieval
Two of the most important aspects contributing to the success of any document retrieval system are the query mechanism and the representation of its auxiliary operational data. The former greatly affects the quality of the retrieval results as well as the speed of the system. The latter reflects the ability of the system to represent its operational data in a compact form that reduces the stora...
Journal:
- Inf. Process. Lett.
Volume 130, Issue
Pages -
Publication date: 2018